Performance and Scalability of Indexed Subgraph Query Processing Methods

نویسندگان

  • Foteini Katsarou
  • Nikos Ntarmos
  • Peter Triantafillou
چکیده

Graph data management systems have become very popular as graphs are the natural data model for many applications. One of the main problems addressed by these systems is subgraph query processing; i.e., given a query graph, return all graphs that contain the query. The naive method for processing such queries is to perform a subgraph isomorphism test against each graph in the dataset. This obviously does not scale, as subgraph isomorphism is NP-Complete. Thus, many indexing methods have been proposed to reduce the number of candidate graphs that have to underpass the subgraph isomorphism test. In this paper, we identify a set of key factors-parameters, that influence the performance of related methods: namely, the number of nodes per graph, the graph density, the number of distinct labels, the number of graphs in the dataset, and the query graph size. We then conduct comprehensive and systematic experiments that analyze the sensitivity of the various methods on the values of the key parameters. Our aims are twofold: first to derive conclusions about the algorithms’ relative performance, and, second, to stress-test all algorithms, deriving insights as to their scalability, and highlight how both performance and scalability depend on the above factors. We choose six wellestablished indexing methods, namely Grapes, CT-Index, GraphGrepSX, gIndex, Tree+∆, and gCode, as representative approaches of the overall design space, including the most recent and best performing methods. We report on their index construction time and index size, and on query processing performance in terms of time and false positive ratio. We employ both real and synthetic datasets. Specifically, four real datasets of different characteristics are used: AIDS, PDBS, PCM, and PPI. In addition, we generate a large number of synthetic graph datasets, empowering us to systematically study the algorithms’ performance and scalability versus the aforementioned key parameters.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

GiS: Fast Indexing and Querying of Graph Structures

We propose a new way of indexing a large database of graphs and processing exact subgraph matching (or subgraph isomorphism) and approximate (full) graph matching queries. Rather that decomposing a graph into smaller units (e.g., paths, trees, graphs) for indexing purposes, we represent each graph in the database by its graph signature, which is essentially a multiset, and each signature is the...

متن کامل

Processing SPARQL Queries Over Linked Data-A Distributed Graph-based Approach

We propose techniques for processing SPARQL queries over a large RDF graph in a distributed environment. We adopt a “partial evaluation and assembly” framework. Answering a SPARQL query Q is equivalent to finding subgraph matches of the query graph Q over RDF graph G. Based on properties of subgraph matching over a distributed graph, we introduce local partial match as partial answers in each f...

متن کامل

Indexing Query Graphs to Speedup Graph Query Processing

Subgraph/supergraph queries although central to graph analytics, are costly as they entail the NP-Complete problem of subgraph isomorphism. We present a fresh solution, the novel principle of which is to acquire and utilize knowledge from the results of previously executed queries. Our approach, iGQ, encompasses two component subindexes to identify if a new query is a subgraph/supergraph of pre...

متن کامل

On Graph Query Optimization in Large Networks

The dramatic proliferation of sophisticated networks has resulted in a growing need for supporting effective querying and mining methods over such large-scale graph-structured data. At the core of many advanced network operations lies a common and critical graph query primitive: how to search graph structures efficiently within a large network? Unfortunately, the graph query is hard due to the ...

متن کامل

GChord: Indexing for Multi-Attribute Query in P2P System with Low Maintenance Cost

To provide complex query processing in peer-to-peer systems has attracted much attention in both academic and industrial community. We present GChord, a scalable technique for evaluating queries with multi-attributes. Both exact match and range queries can be handled by GChord. It has advantages over existing methods in that each tuple only needs to be indexed once, while the query efficiency i...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • PVLDB

دوره 8  شماره 

صفحات  -

تاریخ انتشار 2015